CRAK: Linux Checkpoint/Restart As a Kernel Module
Abstract
Process checkpoint/restart is a very useful technology for process migration, load balancing, crash recovery, transaction rollback, job control, and many other purposes. Although process migration is not yet widely used and is not widely available in commercial systems, the growing shift of computing facilities from supercomputers to networked workstations and distributed systems is increasing the importance of, and demand for, migration technologies. In this paper, we describe the design and implementation of CRAK, an innovative transparent checkpoint/restart package for Linux. CRAK provides transparent migration of Linux networked applications and computing environments without modifying, recompiling, or relinking either the applications or the operating system. CRAK is the first system for Unix/Linux that provides transparent checkpoint/restart with both of the following properties: (1) it does not require any modification of existing operating system or application code, and (2) it supports migrating network sockets. Prototype implementations are available for the Linux 2.2 and Linux 2.4 kernels.
Similar resources
Transparent Checkpoint-Restart of Multiple Processes on Commodity Operating Systems
The ability to checkpoint a running application and restart it later can provide many useful benefits including fault recovery, advanced resource sharing, dynamic load balancing and improved service availability. However, applications often involve multiple processes which have dependencies through the operating system. We present a transparent mechanism for commodity operating systems that ca...
Linux-CR: Transparent Application Checkpoint-Restart in Linux
Application checkpoint-restart is the ability to save the state of a running application so that it can later resume its execution from the time of the checkpoint. Application checkpoint-restart provides many useful benefits including fault recovery, advanced resource sharing, dynamic load balancing and improved service availability. For several years the Linux kernel has been gaining the nece...
The Design and Implementation of Berkeley Lab’s Linux Checkpoint/Restart
Clusters of commodity computers running Linux are becoming an increasingly popular platform for high-performance computing, as they provide the best price/performance ratio in the marketplace. But while the size and raw power of Linux clusters continues to increase, many aspects of their software environments continue to lag behind those provided by proprietary supercomputing systems. One featur...
Instant OS Updates via Userspace Checkpoint-and-Restart
In recent years, operating systems have become increasingly complex and thus more prone to security and performance issues. Accordingly, system updates to address these issues have become more frequently available and increasingly important. To complete such updates, users must reboot their systems, resulting in unavoidable downtime and further loss of the states of running applications. We pre...
A Survey of Checkpoint/Restart Implementations
In this paper we evaluate candidates for a checkpoint/restart implementation against a common set of requirements. Overall characteristics of the two main classes of checkpoint systems, library and system, are discussed followed by specific examples from existing systems. A detailed description of two system implementations is presented. We conclude that no single publicly available implement...